<!DOCTYPE HTML>
<html lang="en">

<head>
    <!-- Google tag (gtag.js) -->
    <script async src="https://www.googletagmanager.com/gtag/js?id=G-FZ8JLG3Z8G"></script>
    <script>
        window.dataLayer = window.dataLayer || [];
        function gtag() { dataLayer.push(arguments); }
        gtag('js', new Date());

        gtag('config', 'G-FZ8JLG3Z8G');
    </script>
    <link rel="preconnect" href="https://fonts.googleapis.com">
    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
    <link href="https://fonts.googleapis.com/css2?family=Source+Sans+3&display=swap" rel="stylesheet">

    <title>Cutie</title>

    <meta name="viewport" content="width=device-width, initial-scale=1">
    <!-- CSS only -->
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.1/dist/css/bootstrap.min.css" rel="stylesheet"
        integrity="sha384-+0n0xVW2eSR5OomGNYDnhzAbDsOXxcvSN1TPprVMTNDbiYZCxYbOOl7+AMvyTG2x" crossorigin="anonymous">
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>

    <link href="style.css" type="text/css" rel="stylesheet" media="screen,projection" />
</head>

<body>
    <br><br><br><br>
    <div class="container">
        <div class="row text-center" style="font-size:38px">
            <div class="col">
                Putting the Object Back into Video Object Segmentation
            </div>
        </div>

        <br>
        <div class="row text-center" style="font-size:28px">
            <div class="col">
                CVPR 2024
            </div>
        </div>
        <br>

        <div class="h-100 row text-center heavy justify-content-md-center" style="font-size:22px;">
            <div class="col-sm-auto px-lg-4">
                <a href="https://hkchengrex.github.io/">Ho Kei Cheng</a>
            </div>
            <div class="col-sm-auto px-lg-4">
                <nobr><a href="https://sites.google.com/view/seoungwugoh/">Seoung Wug Oh</a></nobr>
            </div>
            <div class="col-sm-auto px-lg-4">
                <nobr><a href="https://www.brianpricephd.com/">Brian Price</a></nobr>
            </div>
            <div class="col-sm-auto px-lg-4">
                <nobr><a href="https://joonyoung-cv.github.io/">Joon-Young Lee</a></nobr>
            </div>
            <div class="col-sm-auto px-lg-4">
                <nobr><a href="https://www.alexander-schwing.de/">Alexander Schwing</a></nobr>
            </div>
        </div>

        <br>

        <div class="row text-center" style="font-size:28px">
            <div class="col">
                <a href="https://github.com/hkchengrex/Cutie">Interactive tool available!</a>
                <a href="https://github.com/hkchengrex/Cutie"><img width="80%" src="https://i.imgur.com/nqlYqTq.jpg"
                        alt="gui"></a>
            </div>
        </div>

        <br>

        <div class="h-100 row text-center justify-content-md-center" style="font-size:20px;">
            <div class="col-sm-2">
                <a href="https://arxiv.org/abs/2310.12982">[arXiv]</a>
            </div>
            <div class="col-sm-2">
                <a href="https://arxiv.org/pdf/2310.12982.pdf">[Paper]</a>
            </div>
            <div class="col-sm-2">
                <a href="https://github.com/hkchengrex/Cutie">[Code &amp; Demo]</a>
            </div>
            <div class="col-sm-2">
                <a
                    href="https://colab.research.google.com/drive/1yo43XTbjxuWA7XgCUO9qxAi7wBI6HzvP?usp=sharing">[Colab]</a>
            </div>
        </div>

        <br>

        <hr>

        <div class="row" style="font-size:32px">
            <div class="col">
                Abstract
            </div>
        </div>
        <br>
        <div class="row">
            <div class="col">
                <p style="text-align: justify;">
                    We present Cutie, a video object segmentation (VOS) network with object-level memory reading, which
                    puts the object representation from memory back into the video object segmentation result.
                    Recent works on VOS employ bottom-up pixel-level memory reading, which struggles with matching
                    noise, especially in the presence of distractors, resulting in lower performance on more
                    challenging data.
                    In contrast, Cutie performs top-down object-level memory reading by adapting a small set of object
                    queries for restructuring and interacting with the bottom-up pixel features iteratively with a
                    <span class="strong">q</span>uery-based object <span class="strong">t</span>ransformer (<span
                        class="strong">qt</span>, hence Cutie).
                    The object queries act as a high-level summary of the target object, while high-resolution feature
                    maps are retained for accurate segmentation.
                    Together with foreground-background masked attention, Cutie cleanly separates the semantics of the
                    foreground object from the background.
                    On the challenging MOSE dataset, Cutie improves by 8.7 J&amp;F over XMem with a similar running time
                    and improves by 4.2 J&amp;F over DeAOT while running three times as fast.
                </p>
            </div>

            <img width="80%" src="https://i.imgur.com/k84c965.jpg" alt="overview">

        </div>

        <br>
        <hr>
        <br>

        <div class="row" style="font-size:32px">
            <div class="col">
                Demo video
            </div>
        </div>
        <br>
        <center>
            <video style="width: 100%" controls>
                <source
                    src="https://user-images.githubusercontent.com/7107196/276778222-83a8abd5-369e-41a9-bb91-d9cc1289af70.mp4"
                    type="video/mp4">
                Your browser does not support the video tag.
            </video>
            <a href="https://raw.githubusercontent.com/hkchengrex/Cutie/main/docs/sources.txt">Video source</a>
        </center>

        <br><br>

        <div style="font-size: 14px;">
            Contact: Ho Kei (Rex) Cheng (hkchengrex@gmail.com)
            <br>
        </div>

        <br><br>

    </div>

</body>

</html>
